Abstract

The notion of approachability in repeated games with vector payoffs was introduced by Blackwell in the 1950s, along with geometric conditions for approachability and corresponding strategies that rely on computing steering directions as projections from the current average payoff vector to the (convex) target set. Recently, Abernethy, Bartlett and Hazan (2011) proposed a class of approachability algorithms that rely on the no-regret properties of Online Linear Programming for computing a suitable sequence of steering directions. This is first carried out for target sets that are convex cones, and then generalized to any convex set by embedding it in a higher-dimensional convex cone. In this paper we present a more direct formulation that relies on the support function of the set, along with suitable Online Convex Optimization algorithms, which leads to a general class of approachability algorithms. We further show that Blackwell’s original algorithm and its convergence follow as a special case.


1 Introduction 

Both Blackwell’s theory of approachability and the no-regret framework of online learning address a repeated decision problem in the presence of an arbitrary (namely, unpredictable) adversary. The concept of approachability, introduced in [4], addresses a fundamental feasibility issue for repeated matrix games with vector-valued payoffs. Referring to one player as the agent and to the other as Nature, a set $S$ in the payoff space is approachable by the agent if he can ensure that the average payoff vector converges (with probability 1) to $S$, irrespective of Nature’s strategy. Blackwell provided in his paper geometric conditions for approachability, which are both necessary and sufficient for convex target sets $S$, and a corresponding approachability strategy for the agent. An extensive recent survey of approachability and its implications can be found in [12], and a textbook exposition is available in [?].

Concurrently, Hannan [7] introduced the concept of no-regret play for repeated matrix games. 
The regret of the agent is the shortfall of the cumulative payoff that was actually obtained 
relative to the one that could have been obtained with the best (fixed) action in hindsight, 
given Nature’s observed actions. A no-regret strategy, or algorithm, should ensure that the 
regret grows sub-linearly in time. The no-regret criterion has been widely adopted during the 
last two decades by the machine learning community as a standard measure for the performance 
of online learning algorithms, and its scope has been greatly extended. Of specific relevance 
here is the Online Convex Optimization (OCO) framework, where Nature’s discrete action is 
replaced by the choice of a convex function at each stage, and the agent’s decision is a point 
in a convex set. The textbook [6] offers a broad overview of regret and online learning. Recent surveys of OCO algorithms may be found in [15, 9].

It is well known that no-regret strategies for repeated games can be obtained as a special case of the approachability problem. This was already observed in [3]; an alternative formulation that leads to more explicit strategies was proposed in [8]. More recently, it was shown in [1] that any no-regret algorithm for the online linear optimization problem can be used as a basis for an approachability strategy for convex target sets. The online algorithm is used there to compute a sequence of steering vectors, which replace the projection directions used in Blackwell’s original algorithm.

The scheme suggested in [1] first considers target sets $S$ that are convex cones. The generalization to any convex set is carried out by embedding the original target set in a convex cone in a higher-dimensional payoff space. The present paper proposes a more direct scheme that avoids the above-mentioned embedding. This is done by invoking the support function of the target set, along with well-known relations between this function and the Euclidean distance to the set. As the support function is convex, the full arsenal of OCO algorithms may be applied to provide the required sequence of steering vectors.

A natural question concerns the relation between Blackwell’s original algorithm and the present framework. We first observe that Blackwell’s algorithm is recovered when the standard Follow the Leader (FTL) algorithm is used for the OCO part. Establishing the (known) convergence of this algorithm via the proposed OCO framework is a bit more intricate. First, when the target set has a smooth boundary, we show that FTL guarantees logarithmic regret, which implies “fast” approachability at a rate of $O(\log T/T)$. To address the general case, we further observe that Blackwell’s algorithm is still obtained when a regularized version of FTL is employed, from which the standard $O(T^{-1/2})$ convergence rate may be deduced.

The paper proceeds as follows. In Section 2 we recall the relevant background on Blackwell’s approachability and Online Convex Optimization. Section 3 presents the proposed scheme, in the form of a meta-algorithm that relies on a generic OCO algorithm, discusses the relation to the scheme of [1], and demonstrates a specific algorithm that is obtained by using Online Gradient Descent for the OCO algorithm. In Section 4 we outline the relations with Blackwell’s original algorithm, and provide some concluding remarks.

Notation: The standard inner product in $\mathbb{R}^d$ is denoted by $\langle\cdot,\cdot\rangle$, $\|\cdot\|$ is the Euclidean norm, and $d(r,S)=\inf_{s\in S}\|r-s\|$ denotes the corresponding point-to-set distance. Further, $B_2=\{w\in\mathbb{R}^d:\|w\|\le 1\}$ denotes the Euclidean unit ball, $\Delta(I)$ is the set of probability distributions over a finite set $I$, $\mathrm{diam}(S)=\sup_{s,s'\in S}\|s-s'\|$ is the diameter of the set $S$, and $\|\mathcal{R}-S\|=\sup_{r\in\mathcal{R},\,s\in S}\|r-s\|$ denotes the maximal distance between points in the sets $\mathcal{R}$ and $S$.


2 Model and Background 

We start with a brief review of Blackwell’s approachability and of Online Convex Optimization, focusing on those aspects that are relevant to this paper.


2.1 Approachability 


Consider a repeated game with vector-valued rewards that is played by two players, the agent and Nature. Let $I$ and $J$ denote the finite action sets of these players, with corresponding mixed actions $x=(x(1),\dots,x(|I|))\in\Delta(I)$ and $y=(y(1),\dots,y(|J|))\in\Delta(J)$. Let $r:I\times J\to\mathbb{R}^d$ be the vector-valued reward function of the single-stage game, which is extended to mixed actions as usual through the bilinear function

$$r(x,y)=\sum_{i,j}x(i)\,y(j)\,r(i,j).$$

Similarly, we denote

$$r(x,j)=\sum_{i}x(i)\,r(i,j).$$


The game is repeated in stages $t=1,2,\dots$, where at stage $t$ actions $i_t$ and $j_t$ are chosen by the players, and the reward vector $r(i_t,j_t)$ is obtained. A pure strategy for the agent is a mapping from each possible history $(i_1,j_1,\dots,i_{t-1},j_{t-1})$ to an action $i_t$, and a mixed strategy is a probability distribution over the pure strategies. Nature’s strategies may be similarly defined.
As usual, we restrict attention to so-called behavior strategies of the agent, where the action $i_t$ is drawn randomly according to a mixed action $x_t$, using independent draws across stages. Furthermore, to simplify the presentation, we shall state our results and algorithms in terms of the smoothed reward vectors $r(x_t,j_t)$, where the reward $r(i_t,j_t)$ is averaged over the mixed action $x_t$. This will allow us to state the results in simpler sample-path terms, rather than probabilistic ones; we further discuss this formulation below, after Theorem 1.


Let

$$\bar r_T=\frac{1}{T}\sum_{t=1}^{T}r(x_t,j_t)$$

denote the $T$-stage average reward vector.


Definition 2.1 (Approachability) A closed set $S\subset\mathbb{R}^d$ is approachable if there exists a strategy for the agent and a sequence $\epsilon(T)\to 0$ such that

$$d(\bar r_T,S)\le\epsilon(T),\qquad T\ge 1, \qquad (1)$$

holds (w.p. 1) for any strategy of Nature. A strategy of the agent that satisfies this property is an approachability strategy for $S$.


Theorem 1 (Blackwell, 1956) A closed and convex set $S\subset\mathbb{R}^d$ is approachable if and only if either one of the following equivalent conditions holds:

(i) For each unit vector $u\in\mathbb{R}^d$, there exists a mixed action $x=x_S(u)\in\Delta(I)$ such that

$$\langle u,r(x,j)\rangle\le\sup_{s\in S}\langle u,s\rangle,\quad\text{for all } j\in J. \qquad (2)$$

(ii) For each $y\in\Delta(J)$ there exists $x\in\Delta(I)$ such that $r(x,y)\in S$.

If $S$ is approachable, then the following strategy is an approachability strategy for $S$:

For $v\notin S$, let $u_S(v)$ be the unit vector that points to $v$ from $\mathrm{Proj}_S(v)$, the closest point to $v$ in $S$. Then, for $t\ge 1$, if $\bar r_t\notin S$, choose $x_{t+1}=x_S(u_S(\bar r_t))$; otherwise, choose an arbitrary action.


The approachability strategy introduced by Blackwell has been generalized in [8], which essentially allows different norms to be used for the projection onto $S$. Several recent papers have proposed approachability algorithms that depend on Blackwell’s dual condition (condition (ii) in the above theorem) and avoid the projection step altogether (see [2] and references therein). The current paper again proposes a generalization of Blackwell’s strategy, but from a different viewpoint.

Let us elaborate on the use of the smoothed rewards $r(x_t,j_t)$. This offers several useful benefits:

1. As noted, we obtain sample-path bounds rather than probabilistic ones.

2. We can state results that hold for any sequence $(j_t)$, rather than for any (mixed) strategy of Nature. This is closer to the spirit of online algorithms, where the notion of a randomized choice by Nature may not be meaningful.

3. As is well known, the differences $r(x_t,j_t)-r(i_t,j_t)$ form a martingale difference sequence, so that $\sum_{t=1}^{T}r(x_t,j_t)-\sum_{t=1}^{T}r(i_t,j_t)$ is of order $\sqrt{T}$ (with high probability). Thus, the difference in the means is of order $1/\sqrt{T}$, and convergence results derived for the smoothed mean are valid for the non-smoothed one up to that order.

We note that the results in [1] are developed for the rewards $r(x_t,y_t)$, with the mean taken over $y_t$ as well, and the agent is allowed to observe Nature’s mixed action $y_t$ (or at least the mean reward $r(x_t,y_t)$). We avoid making that extra step and assume that the agent only observes Nature’s pure actions $(j_t)$.

As the pure actions $i_t$ of the agent do not affect the rewards $r_t=r(x_t,j_t)$, we may suppress them in the following discussion and focus on the mixed actions $x_t$. In particular, we restrict attention to strategies of the agent that assign a mixed action $x_t$ to each sequence $(j_1,\dots,j_{t-1})$ of Nature’s actions. (Note that there is no need to include the past mixed actions $x_1,\dots,x_{t-1}$ in the history sequence, since they may be computed recursively; in practice, however, we will express $x_t$ as a function of the past reward vectors $(r(x_k,j_k))_{k<t}$.) Since there is no randomization involved, it may be seen that Definition 2.1 is equivalent to the requirement that the bound (1) holds (deterministically) for any sequence $(j_1,j_2,\dots)$ of Nature’s actions.

2.2 Online Convex Optimization (OCO) 

OCO extends the framework of no-regret learning to function minimization. Let $W$ be a convex and compact set in $\mathbb{R}^d$, and let $\mathcal{F}$ be a set of convex and uniformly bounded functions $f:W\to\mathbb{R}$. Consider a sequential decision problem, where at each stage $t\ge 1$ the agent chooses a point $w_t\in W$, and then observes a function $f_t\in\mathcal{F}$. An algorithm for the agent is a rule for choosing $w_t$, $t\ge 1$, based on the history $\{f_k,w_k\}_{k\le t-1}$. The regret of an algorithm $\mathcal{A}$ is defined as

$$\mathrm{Regret}_T(\mathcal{A})=\sup_{f_1,\dots,f_T\in\mathcal{F}}\Big\{\sum_{t=1}^{T}f_t(w_t)-\min_{w\in W}\sum_{t=1}^{T}f_t(w)\Big\}, \qquad (3)$$

where the supremum is taken over all possible functions $f_t\in\mathcal{F}$. An effective algorithm should guarantee a small regret, and in particular one that grows sub-linearly in $T$.

The OCO problem was introduced in this generality in [16], along with the following Online Gradient Descent algorithm:

$$w_{t+1}=\mathrm{Proj}_W(w_t-\eta_t g_t). \qquad (4)$$

Here $g_t$ is an arbitrary element of $\partial f_t(w_t)$, the subdifferential of $f_t$ at $w_t$, $(\eta_t)$ is a diminishing gain sequence, and $\mathrm{Proj}_W$ denotes the Euclidean projection onto the convex set $W$. To state a regret bound for this algorithm, let $\mathrm{diam}(W)$ denote the diameter of $W$, and suppose that all subgradients of the functions $f_t$ are uniformly bounded in norm by a constant $G$.


Proposition 2 (Zinkevich, 2003) For the Online Gradient Descent algorithm in (4) with gain sequence $\eta_t=\eta/\sqrt{t}$, $\eta>0$, the regret is upper bounded by

$$\mathrm{Regret}_T(\mathrm{OGD})\le\Big(\frac{\mathrm{diam}(W)^2}{\eta}+2\eta G^2\Big)\sqrt{T}. \qquad (5)$$
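To make (4) and (5) concrete, the following minimal Python sketch runs OGD with gains $\eta_t=\eta/\sqrt{t}$ on a hypothetical one-dimensional problem: $W=[-1,1]$ (so $\mathrm{diam}(W)=2$) and $f_t(w)=|w-z_t|$ with $z_t\in W$ (so $G=1$). The loss sequence and all function names are illustrative assumptions; the point is only that the measured regret stays below the bound (5).

```python
import math
import random


def proj_interval(v, lo=-1.0, hi=1.0):
    """Euclidean projection onto W = [lo, hi]."""
    return min(hi, max(lo, v))


def ogd_regret(zs, eta):
    """Run OGD (4) with gains eta_t = eta / sqrt(t) on f_t(w) = |w - z_t| over W = [-1, 1],
    and return the regret against the best fixed point of W in hindsight."""
    w, loss = 0.0, 0.0                                 # w_1 = 0 (any point of W would do)
    for t, z in enumerate(zs, start=1):
        loss += abs(w - z)                             # suffer f_t(w_t)
        g = 1.0 if w > z else (-1.0 if w < z else 0.0)  # a subgradient of |w - z_t| at w_t
        w = proj_interval(w - eta / math.sqrt(t) * g)
    m = sorted(zs)[len(zs) // 2]                       # a median of the z_t minimizes sum_t |w - z_t|
    best = sum(abs(m - z) for z in zs)
    return loss - best


def bound_5(T, eta, diam=2.0, G=1.0):
    """Right-hand side of (5)."""
    return (diam ** 2 / eta + 2 * eta * G ** 2) * math.sqrt(T)


rng = random.Random(0)
zs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
eta = math.sqrt(2.0)
print(ogd_regret(zs, eta), bound_5(len(zs), eta))
```

Since every $z_t$ lies in $W$, the median is a feasible comparator, so the computed quantity is exactly the regret in (3) for this loss sequence.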


Several classes of OCO algorithms are now known, as surveyed in [15, 9]. Of particular relevance here is the Regularized Follow the Leader (RFTL) algorithm, specified by

$$w_{t+1}=\arg\min_{w\in W}\Big\{\sum_{k=1}^{t}f_k(w)+R_t(w)\Big\}, \qquad (6)$$

where $R_t(w)$, $t\ge 1$, is a sequence of regularization functions. With $R_t=0$, the algorithm reduces to the basic Follow the Leader (FTL) algorithm, which does not generally lead to sub-linear regret, unless additional requirements such as strong convexity are imposed on the functions $f_t$ (we will revisit the convergence of FTL in Section 4). For RFTL, we will require the following standard convergence result. Recall that a function $R(w)$ over a convex set $W$ is called $\sigma$-strongly convex if $R(w)-\frac{\sigma}{2}\|w\|^2$ is convex there.


Proposition 3 Suppose that each function $f_t$ is Lipschitz-continuous over $W$, with Lipschitz coefficient $L_f$. Let $R_t(w)=\rho_t R(w)$, where $0<\rho_t\le\rho_{t+1}$, and suppose that the function $R:W\to[0,R_{\max}]$ is Lipschitz continuous with coefficient $L_R$ and is 1-strongly convex. Then,

$$\mathrm{Regret}_T(\mathrm{RFTL})\le 2L_f\sum_{t=1}^{T}\frac{L_f+(\rho_t-\rho_{t-1})L_R}{\rho_t+\rho_{t-1}}+\rho_T R_{\max}. \qquad (7)$$

The last bound can be established along the lines of Theorem 2.11 in [15], which considers the case of fixed regularization parameters, $\rho_t=\rho_0$. The proof is outlined in the Appendix.


3 OCO-Based Approachability 

This section presents the proposed OCO-based approachability algorithm. We start by introducing the support function and some of its properties, and expressing Blackwell’s separation condition in terms of this function. We then present the proposed meta-algorithm that employs a generic OCO algorithm, and finally provide as an example the specific algorithm that is obtained when Online Gradient Descent is used as the OCO algorithm.


3.1 The Support Function 


Let $S\subset\mathbb{R}^d$ be a closed and convex set. The support function $h_S:\mathbb{R}^d\to\mathbb{R}\cup\{\infty\}$ of $S$ is defined as

$$h_S(w)=\sup_{s\in S}\langle w,s\rangle,\qquad w\in\mathbb{R}^d.$$

It is evident that $h_S$ is a convex function (as a pointwise supremum of linear functions), and is positively homogeneous: $h_S(aw)=a\,h_S(w)$ for $a>0$. Furthermore, the Euclidean distance from a point $r$ to $S$ can be expressed as


$$d(r,S)=\max_{w\in B_2}\big\{\langle w,r\rangle-h_S(w)\big\}, \qquad (8)$$

where $B_2$ is the closed Euclidean unit ball (see, e.g., [5], Section 8.1.3; this equality may be readily verified using the minimax theorem). It follows that


$$\arg\max_{w\in B_2}\big\{\langle w,r\rangle-h_S(w)\big\}=\begin{cases}0, & r\in S,\\ u_S(r), & r\notin S,\end{cases} \qquad (9)$$

with $u_S(r)$ as defined in Theorem 1, namely the unit vector pointing to $r$ from $\mathrm{Proj}_S(r)$. Blackwell’s separation condition in (2) can now be written in terms of the support function, as

$$\langle w,r(x,j)\rangle\le\sup_{s\in S}\langle w,s\rangle=h_S(w).$$
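The identities (8) and (9) can be checked numerically. The Python sketch below assumes a hypothetical target set $S\subset\mathbb{R}^2$, the closed disc with center $c$ and radius $\rho$, whose support function is $h_S(w)=\langle w,c\rangle+\rho\|w\|$. Since $\langle w,r\rangle-h_S(w)$ is positively homogeneous in $w$, its maximum over $B_2$ is attained either at $w=0$ or on the unit circle, so scanning a grid of unit vectors suffices; the center, radius, test point and grid size are illustrative choices.

```python
import math

# Hypothetical target set: S = closed disc in R^2 with center C and radius RHO.
C = (1.0, -0.5)
RHO = 0.75


def h_S(w):
    """Support function of the disc: h_S(w) = <w, C> + RHO * ||w||."""
    return w[0] * C[0] + w[1] * C[1] + RHO * math.hypot(w[0], w[1])


def dist_via_support(r, n_angles=3600):
    """Approximate (8): d(r, S) = max_{||w|| <= 1} {<w, r> - h_S(w)}, together with a maximizer.
    The objective is positively homogeneous in w, so the maximum is attained at w = 0
    (value 0) or on the unit circle, which is scanned on a uniform grid of angles."""
    best_val, best_w = 0.0, (0.0, 0.0)
    for k in range(n_angles):
        th = 2 * math.pi * k / n_angles
        w = (math.cos(th), math.sin(th))
        val = w[0] * r[0] + w[1] * r[1] - h_S(w)
        if val > best_val:
            best_val, best_w = val, w
    return best_val, best_w


def u_S(r):
    """u_S(r): unit vector pointing to r from Proj_S(r); for a disc, (r - C) / ||r - C||."""
    dx, dy = r[0] - C[0], r[1] - C[1]
    n = math.hypot(dx, dy)
    return (dx / n, dy / n)


r = (3.0, 2.0)
exact = math.hypot(r[0] - C[0], r[1] - C[1]) - RHO   # closed-form distance to the disc
approx, w_star = dist_via_support(r)
print(exact, approx)        # agree up to the grid resolution
print(w_star, u_S(r))       # the maximizer in (9) is close to u_S(r)
```

For a point inside the disc the scan never improves on $w=0$, which matches the first case of (9).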


We thus obtain the following corollary to Theorem 1.


Corollary 4 A closed and convex set $S$ is approachable if and only if for every vector $w\in B_2$ there exists $x\in\Delta(I)$ so that

$$\langle w,r(x,j)\rangle-h_S(w)\le 0,\qquad\forall j\in J. \qquad (10)$$

Note that the last condition can be written as $\mathrm{val}(w\cdot r)\le h_S(w)$, where

$$\mathrm{val}(w\cdot r)=\min_{x\in\Delta(I)}\max_{j\in J}\langle w,r(x,j)\rangle$$

is the minimax value of the game with the scalar payoff that is obtained by projecting the reward vectors $r(i,j)$ onto $w$. Consequently, a mixed action $x$ that satisfies (10) can be computed as the minimax strategy for the agent in this game.
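For an agent with two actions, the minimax strategy behind $\mathrm{val}(w\cdot r)$ can be computed exactly without a linear-programming solver. Writing $x=(p,1-p)$ and $a_{ij}=\langle w,r(i,j)\rangle$, the function $p\mapsto\max_j\,[p\,a_{0j}+(1-p)\,a_{1j}]$ is convex and piecewise linear, so its minimum over $[0,1]$ is attained at an endpoint or where two of the lines cross. The Python sketch below relies on that assumption ($|I|=2$); for larger action sets one would solve the usual linear program instead. The function names are hypothetical.

```python
from itertools import combinations


def minimax_two_actions(a):
    """a[i][j]: scalar payoff <w, r(i, j)> for agent action i in {0, 1} and Nature action j.
    Returns (p, value): x = (p, 1 - p) minimizes max_j of the expected payoff, value = val(w . r)."""
    n = len(a[0])

    def g(p):
        return max(p * a[0][j] + (1 - p) * a[1][j] for j in range(n))

    candidates = [0.0, 1.0]
    for j, k in combinations(range(n), 2):
        # crossing point of the lines p -> a[1][j] + p*(a[0][j] - a[1][j]) and the same for k
        denom = (a[0][j] - a[1][j]) - (a[0][k] - a[1][k])
        if denom != 0:
            p = (a[1][k] - a[1][j]) / denom
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    p_best = min(candidates, key=g)
    return p_best, g(p_best)


def scalarize(w, r):
    """Project the vector rewards r[i][j] onto w: a[i][j] = <w, r(i, j)>."""
    return [[sum(wk * rk for wk, rk in zip(w, rij)) for rij in row] for row in r]


# Matching-pennies-like scalar game: the minimax strategy is x = (1/2, 1/2) with value 0.
p, value = minimax_two_actions([[1.0, -1.0], [-1.0, 1.0]])
print(p, value)
```

A mixed action satisfying (10) is then obtained whenever the returned value is at most $h_S(w)$.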
3.2 The General Algorithm


The proposed algorithm builds on the following idea. First, we employ an OCO algorithm to generate a sequence of steering vectors $w_t\in B_2$, so that

$$\sum_{t=1}^{T}\big(\langle w_t,r_t\rangle-h_S(w_t)\big)\ge T\max_{w\in B_2}\big\{\langle w,\bar r_T\rangle-h_S(w)\big\}-\alpha(T), \qquad (11)$$

where $r_t=r(x_t,j_t)$ is considered an arbitrary vector that is revealed after $w_t$ is specified, and $\alpha(T)=o(T)$. Next, given $w_t$, we choose $x_t$ that satisfies (10), so that $\langle w_t,r_t\rangle-h_S(w_t)\le 0$. Using this inequality in (11), and observing the distance formula (8), yields

$$d(\bar r_T,S)\le\frac{\alpha(T)}{T}\to 0.$$


To secure (11), observe that the function $f(w;r)=-\langle w,r\rangle+h_S(w)$ is convex in $w$ for each vector $r$. Therefore, an OCO algorithm can be applied to the sequence of convex functions $f_t(w)=-\langle w,r_t\rangle+h_S(w)$, where $r_t=r(x_t,j_t)$ is considered an arbitrary vector which is revealed only after $w_t$ is specified. Applying an OCO algorithm $\mathcal{A}$ with $\mathrm{Regret}_T(\mathcal{A})\le\alpha(T)$ to this setup, we obtain a sequence $(w_t)$ such that

$$\sum_{t=1}^{T}f_t(w_t)\le\min_{w\in B_2}\sum_{t=1}^{T}f_t(w)+\alpha(T),$$

where

$$\sum_{t=1}^{T}f_t(w_t)=-\sum_{t=1}^{T}\big(\langle w_t,r_t\rangle-h_S(w_t)\big),$$

$$\sum_{t=1}^{T}f_t(w)=-\sum_{t=1}^{T}\big(\langle w,r_t\rangle-h_S(w)\big)=-T\big(\langle w,\bar r_T\rangle-h_S(w)\big).$$

This clearly implies (11).


The discussion above leads to the following approachability meta-algorithm. 


Algorithm 1 (Approachability Meta-Algorithm Based on OCO)

Given: A closed, convex and approachable set $S$; a procedure (e.g., a linear program) to compute $x$, for a given vector $w$, so that (10) is satisfied; an OCO algorithm $\mathcal{A}$ for the functions $f_t(w)=-\langle w,r_t\rangle+h_S(w)$, with $\mathrm{Regret}_T(\mathcal{A})\le\alpha(T)$.

Repeat for $t=1,2,\dots$:

1. Obtain $w_t$ from the OCO algorithm applied to the convex functions $f_k(w)=-\langle w,r_k\rangle+h_S(w)$, $k\le t-1$, so that inequality (11) is satisfied.

2. Choose $x_t$ according to (10), so that $\langle w_t,r(x_t,j)\rangle-h_S(w_t)\le 0$ holds for all $j\in J$.

3. Observe Nature’s action $j_t$, and set $r_t=r(x_t,j_t)$.
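The loop of Algorithm 1 can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: a hypothetical game with two agent actions, two Nature actions and rewards in $\mathbb{R}^2$ (the table `REWARDS` holds $r(i,j)$), the target $S$ taken as the disc of radius 0.1 around the origin (approachable, since $x=(1/2,1/2)$ yields $r(x,y)=0$ for every $y$), Nature playing uniformly at random, the exact two-action minimax computation for step 2, and Online Gradient Descent with gains $\eta_t=\eta/\sqrt{t}$ (the instance described in Section 3.3) as the OCO algorithm. For this instance $\|\mathcal{R}-S\|=1.1$, so Proposition 5 combined with (5) predicts $d(\bar r_T,S)\le 4\sqrt{2}\cdot 1.1/\sqrt{T}$ for $\eta=\sqrt{2}/1.1$.

```python
import math
import random
from itertools import combinations

# Hypothetical toy instance: REWARDS[i][j] = r(i, j) in R^2; S = disc of radius RHO at the origin.
REWARDS = [[(1.0, 0.0), (0.0, 1.0)],      # r(0, j) for j = 0, 1
           [(-1.0, 0.0), (0.0, -1.0)]]    # r(1, j) for j = 0, 1
RHO = 0.1


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def h_S(w):
    """Support function of the disc: h_S(w) = RHO * ||w||."""
    return RHO * math.hypot(w[0], w[1])


def subgrad_h_S(w):
    """An element of dh_S(w) = argmax_{s in S} <s, w> (any point of S when w = 0)."""
    n = math.hypot(w[0], w[1])
    return (RHO * w[0] / n, RHO * w[1] / n) if n > 0 else (0.0, 0.0)


def choose_x(w):
    """Step 2: a minimax p, with x = (p, 1 - p), for the scalar game <w, r(i, j)>."""
    a = [[dot(w, REWARDS[i][j]) for j in range(2)] for i in range(2)]

    def g(p):
        return max(p * a[0][j] + (1 - p) * a[1][j] for j in range(2))

    cands = [0.0, 1.0]
    for j, k in combinations(range(2), 2):
        den = (a[0][j] - a[1][j]) - (a[0][k] - a[1][k])
        if den != 0:
            p = (a[1][k] - a[1][j]) / den
            if 0.0 <= p <= 1.0:
                cands.append(p)
    return min(cands, key=g)


def smoothed_reward(p, j):
    """r(x, j) for x = (p, 1 - p)."""
    return tuple(p * u + (1 - p) * v for u, v in zip(REWARDS[0][j], REWARDS[1][j]))


def proj_B2(v):
    """Proj_{B_2}(v) = v / max{1, ||v||}."""
    n = max(1.0, math.hypot(v[0], v[1]))
    return (v[0] / n, v[1] / n)


def run_algorithm_1(T, eta, seed=0):
    """Algorithm 1 with OGD (gains eta / sqrt(t)) as the OCO algorithm; returns d(r_bar_T, S)."""
    rng = random.Random(seed)
    w = (0.0, 0.0)                               # w_1: an arbitrary point of B_2
    total = [0.0, 0.0]
    for t in range(1, T + 1):
        p = choose_x(w)                          # step 2: <w_t, r(x_t, j)> - h_S(w_t) <= 0 for all j
        assert max(dot(w, smoothed_reward(p, j)) for j in range(2)) - h_S(w) <= 1e-12
        j = rng.randrange(2)                     # step 3: Nature's action (random here)
        r = smoothed_reward(p, j)
        total[0] += r[0]
        total[1] += r[1]
        y = subgrad_h_S(w)                       # step 1 for t + 1: OGD on f_t(w) = -<w, r_t> + h_S(w)
        eta_t = eta / math.sqrt(t)
        w = proj_B2((w[0] + eta_t * (r[0] - y[0]), w[1] + eta_t * (r[1] - y[1])))
    r_bar = (total[0] / T, total[1] / T)
    return max(0.0, math.hypot(r_bar[0], r_bar[1]) - RHO)


L = 1.1                                          # ||R - S|| for this instance
eta = math.sqrt(2.0) / L
T = 2000
print(run_algorithm_1(T, eta), 4 * math.sqrt(2.0) * L / math.sqrt(T))
```

Any other OCO algorithm with sublinear regret could replace the three OGD lines at the end of the loop; only `choose_x` depends on the game.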


Proposition 5 For the algorithm above,

$$d(\bar r_T,S)\le\frac{\alpha(T)}{T}$$

is satisfied for all $T\ge 1$ and any sequence $(j_1,j_2,\dots)$ of Nature’s actions.


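Proof: As observed above, application of the OCO algorithm implies (11), so that

$$d(\bar r_T,S)=\max_{w\in B_2}\big\{\langle w,\bar r_T\rangle-h_S(w)\big\}\le\frac{1}{T}\sum_{t=1}^{T}\big(\langle w_t,r_t\rangle-h_S(w_t)\big)+\frac{\alpha(T)}{T}\le\frac{\alpha(T)}{T},$$

where the last inequality holds since each summand is nonpositive by the choice of $x_t$ in step 2. □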

To recap, any OCO algorithm that guarantees (11) with $\alpha(T)/T\to 0$ induces an approachability strategy with rate of convergence $\alpha(T)/T$.


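Remark 1 (Convex Cones) The approachability algorithm developed in [1] starts with target sets $S$ that are restricted to be convex cones. For $S$ a closed convex cone, the support function is given by

$$h_S(w)=\begin{cases}0, & w\in S^\circ,\\ \infty, & w\notin S^\circ,\end{cases}$$

where $S^\circ$ is the polar cone of $S$. The required inequality in (11) therefore reduces to

$$\sum_{t=1}^{T}\langle w_t,r_t\rangle\ge T\max_{w\in B_2\cap S^\circ}\langle w,\bar r_T\rangle-\alpha(T).$$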


The sequence $(w_t)$ can be obtained in this case by applying an online linear optimization algorithm restricted to $w_t\in B_2\cap S^\circ$. This is the algorithm proposed in [1].

The extension to general convex sets is handled there by lifting the problem to a $(d+1)$-dimensional space, with payoff vector $r'(x,y)=(\kappa,r(x,y))$ and target set $S'=\mathrm{cone}(\{\kappa\}\times S)$, where $\kappa=\max_{s\in S}\|s\|$, for which it holds that $d(u,S)\le 2\,d(u',S')$. For further details see [1].
3.3 An OGD-based Approachability Algorithm

As a concrete example, let us apply the Online Gradient Descent algorithm specified in (4) to our problem. With $W=B_2$ and $f_t(w)=-(\langle w,r_t\rangle-h_S(w))$, we obtain in step 1 of Algorithm 1

$$w_{t+1}=\mathrm{Proj}_{B_2}\big\{w_t+\eta_t(r_t-y_t)\big\},\qquad y_t\in\partial h_S(w_t).$$

Observe that $\mathrm{Proj}_{B_2}(v)=v/\max\{1,\|v\|\}$, and (e.g., Corollary 8.25 in [14])

$$\partial h_S(w)=\arg\max_{s\in S}\langle s,w\rangle.$$
seS 

To evaluate the convergence rate in (5), observe that $\mathrm{diam}(B_2)=2$, and, since $y_t\in S$, $\|g_t\|=\|r_t-y_t\|\le\|\mathcal{R}-S\|$, where $\mathcal{R}=\{r(x,y)\}_{x\in\Delta(I),\,y\in\Delta(J)}$ is the reward set. Assuming for the moment that the goal set $S$ is bounded, we obtain

$$d(\bar r_T,S)\le\frac{b(\eta)}{\sqrt{T}},\qquad\text{with } b(\eta)=\frac{4}{\eta}+2\eta\|\mathcal{R}-S\|^2.$$

For $\eta=\sqrt{2}/\|\mathcal{R}-S\|$, we thus obtain $b(\eta)=4\sqrt{2}\,\|\mathcal{R}-S\|$.
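The choice of $\eta$ above minimizes $b(\eta)=4/\eta+2\eta L^2$ with $L=\|\mathcal{R}-S\|$: setting $b'(\eta)=-4/\eta^2+2L^2=0$ gives $\eta=\sqrt{2}/L$, and then $b=4L/\sqrt{2}+2\sqrt{2}L=4\sqrt{2}\,L$. A quick numeric check in Python, with a hypothetical value of $L$:

```python
import math


def b(eta, L):
    """b(eta) = diam(B_2)^2 / eta + 2 * eta * L^2, with diam(B_2) = 2 and L = ||R - S||."""
    return 4.0 / eta + 2.0 * eta * L ** 2


L = 1.1                                                     # hypothetical value of ||R - S||
eta_star = math.sqrt(2.0) / L
grid_min = min(b(k / 1000.0, L) for k in range(1, 10001))   # eta on a grid in (0, 10]
print(b(eta_star, L), 4 * math.sqrt(2.0) * L, grid_min)
```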

If $S$ is not bounded, it can always be intersected with $\mathcal{R}$ (without affecting its approachability), yielding $\|\mathcal{R}-S\|\le\mathrm{diam}(\mathcal{R})$. This amounts to modifying the choice of $y_t$ in the algorithm to

$$y_t\in\partial h_{S\cap\mathcal{R}}(w_t)=\arg\max_{y\in S\cap\mathcal{R}}\langle y,w_t\rangle.$$

Alternatively, one may restrict attention (by projection) to vectors $w_t$ in the set $\{w\in B_2:h_S(w)<\infty\}$, similarly to the case of convex cones mentioned in Remark 1 above; we will not go into further details here.

4 Blackwell’s Algorithm and (R)FTL 

We next examine the relation between Blackwell’s approachability algorithm and the present 
OCO-based framework. We first show that Blackwell’s algorithm coincides with OCO-based 
approachability when FTL is used as the OCO algorithm. We use this equivalence to establish 
fast (logarithmic) convergence rates for Blackwell’s algorithm when the target set S has a 
smooth boundary. Interestingly, this equivalence does not provide a convergence result for 
general convex sets. To complete the picture, we show that Blackwell’s algorithm can more 
generally be obtained via a regularized version of FTL, which leads to an alternative proof of 
convergence of the algorithm in the general case. 
4.1 Blackwell’s algorithm as FTL


Recall Blackwell’s algorithm as specified in Theorem 1, namely $x_{t+1}$ is chosen as a mixed action that satisfies (2) for $u=u_S(\bar r_t)$.


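Lemma 6 For $f_t(w)=-\langle w,r_t\rangle+h_S(w)$,

$$\arg\min_{w\in B_2}\sum_{k=1}^{t}f_k(w)=\begin{cases}u_S(\bar r_t), & \bar r_t\notin S,\\ 0, & \bar r_t\in S.\end{cases}$$

Proof: Observe that $\sum_{k=1}^{t}f_k(w)=-t\big(\langle w,\bar r_t\rangle-h_S(w)\big)$, so that

$$\arg\min_{w\in B_2}\sum_{k=1}^{t}f_k(w)=\arg\max_{w\in B_2}\big\{\langle w,\bar r_t\rangle-h_S(w)\big\}.$$

The required equality now follows from (9). □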

Comparing to (6) with $R_t=0$, it may be seen that the sequence of projection directions $u_S(\bar r_t)$ in Blackwell’s algorithm coincides with the sequence $(w_t)$ that is obtained by applying the FTL algorithm to the functions $(f_t)$ over $w\in B_2$. It follows that Blackwell’s algorithm is identical to Algorithm 1 with this choice of the OCO algorithm.


To establish convergence of Blackwell’s algorithm via this equivalence, one needs to show that FTL guarantees the regret bound in (11) for an arbitrary reward sequence $(r_t)\subset\mathcal{R}$, with a sublinear rate sequence $\alpha(T)$. It is well known, however, that (unregularized) FTL does not guarantee sublinear regret without some additional assumptions on the functions $f_t$. A simple counter-example, reformulated to the present case, is devised as follows: Let $S=\{0\}\subset\mathbb{R}$, so that $h_S(w)=0$, and suppose that $r_1=-1$ and $r_t=2(-1)^t$ for $t>1$. Since $w_t=\mathrm{sign}(\bar r_{t-1})$ and $\mathrm{sign}(r_t)=-\mathrm{sign}(\bar r_{t-1})$, we obtain that $f_t(w_t)=-r_tw_t=|r_t|=2$ for $t>1$, whereas $\min_{w\in[-1,1]}\sum_{t=1}^{T}f_t(w)=-|\sum_{t=1}^{T}r_t|=-1$, leading to a linearly-increasing regret.
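The counter-example can be replayed in a few lines of Python. With $h_S=0$ and $W=B_2=[-1,1]$, FTL picks $w_t=\mathrm{sign}(r_1+\dots+r_{t-1})$; the sketch takes $w_1=0$ (an arbitrary choice, since no data is available at $t=1$), in which case the regret works out to exactly $2T-1$ for $T\ge 2$.

```python
def ftl_counterexample_regret(T):
    """FTL on f_t(w) = -w * r_t over W = B_2 = [-1, 1] (S = {0}, so h_S = 0),
    with r_1 = -1 and r_t = 2 * (-1)**t for t > 1. Returns Regret_T(FTL)."""
    rs = [-1] + [2 * (-1) ** t for t in range(2, T + 1)]
    cum, loss = 0, 0
    for r in rs:
        w = (cum > 0) - (cum < 0)        # FTL: w_t = sign(r_1 + ... + r_{t-1}); w_1 = 0
        loss += -w * r                   # f_t(w_t)
        cum += r
    best = min(-u * cum for u in (-1, 1))   # a linear function on [-1, 1] is minimized at an endpoint
    return loss - best


print([ftl_counterexample_regret(T) for T in (2, 10, 100, 1000)])
```

The cumulative reward alternates between $-1$ and $1$, so FTL always commits to the sign that the next reward punishes.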


The failure of FTL in this example is clearly due to the fast changes in the predictors $w_t$. We now add some smoothness assumptions on the set $S$ that can mitigate such abrupt changes.


Assumption 1 Let $S$ be a compact and convex set. Suppose that the boundary $\partial S$ of $S$ is smooth with curvature bounded by $\kappa_0$, namely:

$$\|n(s_1)-n(s_2)\|\le\kappa_0\|s_1-s_2\|\quad\text{for all } s_1,s_2\in\partial S, \qquad (12)$$

where $n(s)$ is the unique unit outer normal to $S$ at $s\in\partial S$.

For example, for a closed Euclidean ball of radius $\rho$, (12) is satisfied with equality for $\kappa_0=\rho^{-1}$. The assumed smoothness property may in fact be formulated in terms of an interior sphere condition: For any point $s\in\partial S$ there exists a ball $B(\rho)\subset S$ with radius $\rho=\kappa_0^{-1}$ such that $s\in B(\rho)$.

Proposition 7 Let Assumption 1 hold. Consider Blackwell’s algorithm as specified in Theorem 1, and denote $w_t=u_S(\bar r_{t-1})$ (with $w_1$ arbitrary). Then, for any time $T\ge 1$ such that $\bar r_T\notin S$, (11) holds with

$$\alpha(T)=C_0(1+\ln T), \qquad (13)$$

where $C_0=\mathrm{diam}(\mathcal{R})\,\|\mathcal{R}-S\|\,\kappa_0$, $C_1=\|\mathcal{R}-S\|$, and $\ln(\cdot)$ is the natural logarithm. Consequently,

$$d(\bar r_T,S)\le C_0\,\frac{1+\ln T}{T},\qquad T\ge 1. \qquad (14)$$

Proof: See the Appendix.

The last result establishes a fast convergence rate (of order $\log T/T$) for Blackwell’s approachability algorithm, under the assumed smoothness of the target set. We observe that in the stochastic version of the algorithm, which is based on the rewards $r(i_t,j_t)$ rather than $r(x_t,j_t)$, the convergence is still of order $T^{-1/2}$ due to the added stochastic effect (unless all mixed actions $x_t$ happen to be pure). We also note that logarithmic regret bounds for OCO algorithms were derived in [10], under strong convexity conditions on the functions $f_t$. Finally, conditions for fast approachability (of order $T^{-1}$) were derived in [13], but are of a different nature than the above.

4.2 Blackwell’s algorithm as RFTL 

The smoothness requirement in Assumption 1 precludes such important target sets as polyhedra and cones. As observed above, in the absence of such additional smoothness properties the interpretation of Blackwell’s algorithm through an FTL scheme does not imply its convergence, as the regret of FTL (and the corresponding bound $\alpha(T)$ in (11)) might increase linearly in general.

To address the general case, we show next that Blackwell’s algorithm can be identified more generally with a regularized version of FTL. This algorithm does guarantee an $O(\sqrt{T})$ regret in (11), and consequently leads to the standard $O(T^{-1/2})$ rate of convergence of Blackwell’s approachability algorithm.

Our starting point is the following observation: 

Lemma 8 For $f_k(w)=-\langle w,r_k\rangle+h_S(w)$, $1\le k\le t$, and any $\rho_t>0$,

$$w_{t+1}=\arg\min_{w\in B_2}\Big\{\sum_{k=1}^{t}f_k(w)+\frac{\rho_t}{2}\|w\|^2\Big\}=\begin{cases}\beta_t\,u_S(\bar r_t), & \bar r_t\notin S,\\ 0, & \bar r_t\in S,\end{cases} \qquad (15)$$

where $\beta_t=\min\{1,\tfrac{t}{\rho_t}\,d(\bar r_t,S)\}$, which is positive whenever $\bar r_t\notin S$.

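Proof: Recall that $\sum_{k=1}^{t}f_k(w)=-t\big(\langle w,\bar r_t\rangle-h_S(w)\big)$, so that

$$\arg\min_{w\in B_2}\Big\{\sum_{k=1}^{t}f_k(w)+\frac{\rho_t}{2}\|w\|^2\Big\}=\arg\max_{w\in B_2}\Big\{\langle w,\bar r_t\rangle-h_S(w)-\frac{\rho_t}{2t}\|w\|^2\Big\}.$$

To compute the right-hand side, we first maximize over $\{w:\|w\|=\beta\}$, and then optimize over $\beta\in[0,1]$. Denote $r=\bar r_t$ and $\eta=\rho_t/t$. Similarly to Lemma 6, for $r\notin S$,

$$\arg\max_{\|w\|=\beta}\Big\{\langle w,r\rangle-h_S(w)-\frac{\eta}{2}\|w\|^2\Big\}=\arg\max_{\|w\|=\beta}\big\{\langle w,r\rangle-h_S(w)\big\}=\beta\,u_S(r),$$

and, by positive homogeneity and (8),

$$\max_{\|w\|=\beta}\Big\{\langle w,r\rangle-h_S(w)-\frac{\eta}{2}\|w\|^2\Big\}=\beta\,d(r,S)-\frac{\eta}{2}\beta^2.$$

Maximizing the latter over $0\le\beta\le 1$ gives $\beta^*=\min\{1,d(r,S)/\eta\}$. For $r\in S$, the objective is nonpositive since $\langle w,r\rangle\le h_S(w)$, so the maximum is attained at $w=0$. Substituting back $r$ and $\eta$ gives (15). □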
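The closed form (15) can be compared against a brute-force minimization. The Python sketch below assumes a hypothetical target set, the unit disc centered at the origin (so $u_S(r)=(r-c)/\|r-c\|$), evaluates $t\big(-\langle w,\bar r_t\rangle+h_S(w)\big)+\frac{\rho_t}{2}\|w\|^2$ on a polar grid of $B_2$, and checks that the grid minimizer lies close to $\beta_t\,u_S(\bar r_t)$. The test points, $t$, $\rho_t$ and grid sizes are illustrative choices.

```python
import math

# Hypothetical target set: S = closed disc of radius RHO centered at C (h_S(w) = <w, C> + RHO * ||w||).
C, RHO = (0.0, 0.0), 1.0


def h_S(w):
    return w[0] * C[0] + w[1] * C[1] + RHO * math.hypot(w[0], w[1])


def dist_S(r):
    return max(0.0, math.hypot(r[0] - C[0], r[1] - C[1]) - RHO)


def rftl_closed_form(r_bar, t, rho_t):
    """Right-hand side of (15): beta_t * u_S(r_bar) if r_bar is outside S, else 0."""
    d = dist_S(r_bar)
    if d == 0.0:
        return (0.0, 0.0)
    dx, dy = r_bar[0] - C[0], r_bar[1] - C[1]
    n = math.hypot(dx, dy)                  # u_S(r_bar) = (r_bar - C) / ||r_bar - C|| for a disc
    beta = min(1.0, t / rho_t * d)
    return (beta * dx / n, beta * dy / n)


def rftl_brute_force(r_bar, t, rho_t, n_rad=200, n_ang=360):
    """Minimize sum_{k<=t} f_k(w) + (rho_t/2)||w||^2 = t(-<w, r_bar> + h_S(w)) + (rho_t/2)||w||^2
    over a polar grid of B_2 (plus w = 0, where the objective is 0)."""
    best_val, best_w = 0.0, (0.0, 0.0)
    for i in range(1, n_rad + 1):
        rad = i / n_rad
        for k in range(n_ang):
            th = 2 * math.pi * k / n_ang
            w = (rad * math.cos(th), rad * math.sin(th))
            val = t * (-(w[0] * r_bar[0] + w[1] * r_bar[1]) + h_S(w)) + 0.5 * rho_t * rad ** 2
            if val < best_val:
                best_val, best_w = val, w
    return best_w


CASES = [((1.3, 0.0), 10, 6.0),    # beta_t = 0.5 (regularization shrinks the step)
         ((0.0, 3.0), 10, 1.0),    # beta_t = 1 (Blackwell's unit direction)
         ((0.2, 0.1), 10, 1.0)]    # r_bar inside S: w_{t+1} = 0
for r_bar, t, rho_t in CASES:
    print(r_bar, rftl_closed_form(r_bar, t, rho_t), rftl_brute_force(r_bar, t, rho_t))
```

In all three cases the direction of $w_{t+1}$ is Blackwell's $u_S(\bar r_t)$; only its length depends on $\rho_t$.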

Equation (15) defines an RFTL algorithm with quadratic regularization. When used for the OCO part in Algorithm 1, the resulting scheme turns out to be equivalent to Blackwell’s algorithm. Indeed, the minimum in (15) is attained by the same unit vector $u_S(\bar r_t)$ that appears in Theorem 1, scaled by a positive constant. That scaling does not affect the choice of $x_t$ according to (10), as the support function $h_S(w)$ is positively homogeneous. However, this scaling does induce sublinear regret for the OCO algorithm, and consequently convergence of the approachability algorithm. This is summarized as follows.

Proposition 9 Let $S$ be a convex and compact set. Consider the RFTL algorithm specified in equation (15), with $\rho_t=\rho\sqrt{t}$, $\rho>0$. The regret of this algorithm is bounded by

$$\mathrm{Regret}_T(\mathrm{RFTL})\le\Big(\frac{2L_f^2}{\rho}+\rho\Big)\sqrt{T}+\frac{2L_f^2}{\rho}+L_f\ln(4T-3)=\alpha_0(T),$$

where $L_f=\|\mathcal{R}-S\|$. Consequently, if this RFTL algorithm is used in step 1 of Algorithm 1 to provide $w_t$, we obtain

$$d(\bar r_T,S)\le\frac{\alpha_0(T)}{T}=O(T^{-1/2}),\qquad T\ge 1. \qquad (16)$$

Proof: The regret bound follows from the one in Proposition 3, evaluated for $f_t(w)=-\langle r_t,w\rangle+h_S(w)$, $W=B_2$, $R(w)=\|w\|^2$, and $\rho_t=\rho\sqrt{t}$. Recalling that $\partial f_t(w)=-r_t+\arg\max_{s\in S}\langle w,s\rangle$, the Lipschitz constant of $f_t$ is upper bounded by $\|\mathcal{R}-S\|=L_f$. Furthermore, $R_{\max}=1$ and $L_R=2$. Therefore,

$$\mathrm{Regret}_T(\mathrm{RFTL})\le 2L_f\sum_{t=1}^{T}\frac{L_f+2(\rho_t-\rho_{t-1})}{\rho_t+\rho_{t-1}}+\rho\sqrt{T}.$$

Upper bounding the sums with corresponding integrals gives the stated regret bound. The second part now follows directly from Proposition 5. □

With $\rho=\sqrt{2}\,L_f$, we obtain in (16) the convergence rate

$$d(\bar r_T,S)\le\frac{2\sqrt{2}\,\|\mathcal{R}-S\|}{\sqrt{T}}+o\Big(\frac{1}{\sqrt{T}}\Big).$$


We emphasize that the algorithm discussed in this section is equivalent to Blackwell’s algorithm, hence its convergence is well known. The proof of convergence here is certainly not the simplest, nor does it lead to the best constants in the convergence rate. Indeed, Blackwell’s proof (which recursively bounds the squared distance $d(\bar r_T,S)^2$) leads to a bound of the same order with a smaller constant. Rather, our main purpose here was to provide an alternative view and analysis of Blackwell’s algorithm, which relies on a standard OCO algorithm. Nonetheless, the logarithmic convergence rate that was obtained under the smoothness Assumption 1 appears to be new.
Appendix

Proof of Proposition 3: We follow the outline of the proof of Lemma 2.10 in [15], modified to accommodate a non-constant regularization sequence $\rho_t$. The starting point is the inequality, proved by induction,

$$\sum_{t=1}^{T}\big(f_t(w_t)-f_t(u)\big)\le\sum_{t=1}^{T}\big(f_t(w_t)-f_t(w_{t+1})\big)+\rho_T R(u), \qquad (17)$$

which holds for any $u\in W$. Therefore,


$$\sum_{t=1}^{T}\big(f_t(w_t)-f_t(u)\big)\le\sum_{t=1}^{T}L_f\|w_t-w_{t+1}\|+\rho_T R(u). \qquad (18)$$

Denote $F_t(w)=\sum_{k=1}^{t-1}f_k(w)+\rho_{t-1}R(w)$. Then $F_t$ is $\rho_{t-1}$-strongly convex, and $w_t$ is its minimizer by definition. Hence, it holds generally that

$$F_t(u)\ge F_t(w_t)+\frac{\rho_{t-1}}{2}\|u-w_t\|^2,$$

and in particular,

$$F_t(w_{t+1})\ge F_t(w_t)+\frac{\rho_{t-1}}{2}\|w_{t+1}-w_t\|^2, \qquad (19)$$

$$F_{t+1}(w_t)\ge F_{t+1}(w_{t+1})+\frac{\rho_t}{2}\|w_t-w_{t+1}\|^2. \qquad (20)$$

Summing and cancelling terms, we obtain

$$f_t(w_t)-f_t(w_{t+1})+(\rho_t-\rho_{t-1})\big(R(w_t)-R(w_{t+1})\big)\ge\frac{\rho_t+\rho_{t-1}}{2}\|w_{t+1}-w_t\|^2.$$

But the left-hand side is upper-bounded by $(L_f+(\rho_t-\rho_{t-1})L_R)\|w_{t+1}-w_t\|$, which implies that

$$\|w_{t+1}-w_t\|\le 2\,\frac{L_f+(\rho_t-\rho_{t-1})L_R}{\rho_t+\rho_{t-1}}.$$

Substituting in (18) gives the bound stated in the Proposition. □

Proof of Proposition 7: We first observe that the regret bound in (13) implies (14). Indeed, for $\bar r_T\notin S$, $d(\bar r_T,S)\le\alpha(T)/T$ follows as in Proposition 5, while if $\bar r_T\in S$ then $d(\bar r_T,S)=0$ and (14) holds trivially.

We proceed to establish the logarithmic regret bound in (13). Let $f_t(w)=-\langle w,r_t\rangle+h_S(w)$, $W=B_2$, and denote

$$\mathrm{Regret}_T(f_{1:T})=\sum_{t=1}^{T}f_t(w_t)-\min_{w\in W}\sum_{t=1}^{T}f_t(w)=\sum_{t=1}^{T}\big(f_t(w_t)-f_t(w_{T+1})\big). \qquad (21)$$

A standard induction argument (e.g., Lemma 2.1 in [15]) verifies that

$$\sum_{t=1}^{T}\big(f_t(w_t)-f_t(u)\big)\le\sum_{t=1}^{T}\big(f_t(w_t)-f_t(w_{t+1})\big) \qquad (22)$$

holds for any $u\in W$, and in particular for $u=w_{T+1}$. It remains to upper-bound the differences in the last sum.

Consider first the case where $\bar r_t\notin S$ for all $1\le t\le T$. We first show that $\|w_t-w_{t+1}\|$ is small, which implies the same for $|f_t(w_t)-f_t(w_{t+1})|$. By its definition, $w_{t+1}=u_S(\bar r_t)$, the unit vector pointing to $\bar r_t$ from $c_t=\mathrm{Proj}_S(\bar r_t)$, which clearly coincides with the outer unit normal $n(c_t)$ to $S$ at $c_t$. It follows that

$$\|w_t-w_{t+1}\|=\|n(c_{t-1})-n(c_t)\|\le\kappa_0\|c_{t-1}-c_t\|\le\kappa_0\|\bar r_{t-1}-\bar r_t\|,$$

where the first inequality follows by Assumption 1 and the second is due to the non-expansiveness of the projection. Substituting $\bar r_t=\bar r_{t-1}+\frac{1}{t}(r_t-\bar r_{t-1})$ yields

$$\|w_t-w_{t+1}\|\le\frac{\kappa_0}{t}\|r_t-\bar r_{t-1}\|\le\frac{\kappa_0}{t}\,\mathrm{diam}(\mathcal{R}). \qquad (23)$$

Next, observe that for any pair of unit vectors $w_1$ and $w_2$,

$$\begin{aligned}f_t(w_1)-f_t(w_2)&=-\langle w_1-w_2,r_t\rangle+h_S(w_1)-h_S(w_2)\\&=-\langle w_1-w_2,r_t\rangle+\max_{s\in S}\langle w_1,s\rangle-\max_{s\in S}\langle w_2,s\rangle\\&\le-\langle w_1-w_2,r_t\rangle+\langle w_1,s_1\rangle-\langle w_2,s_1\rangle\\&=\langle w_1-w_2,s_1-r_t\rangle\le\|w_1-w_2\|\,\|\mathcal{R}-S\|,\end{aligned}$$

where $s_1\in S$ attains the first maximum. Since the same bound holds for $f_t(w_2)-f_t(w_1)$, it holds also for the absolute value. In particular,

$$|f_t(w_t)-f_t(w_{t+1})|\le\|w_t-w_{t+1}\|\,\|\mathcal{R}-S\|, \qquad (24)$$

and together with (23) we obtain

$$|f_t(w_t)-f_t(w_{t+1})|\le\frac{\kappa_0}{t}\,\mathrm{diam}(\mathcal{R})\,\|\mathcal{R}-S\|=\frac{C_0}{t}.$$

Substituting in (22) and using $\sum_{t=1}^{T}t^{-1}\le 1+\ln T$ yields the regret bound

$$\mathrm{Regret}_T(f_{1:T})\le C_0(1+\ln T). \qquad (25)$$

We next extend this bound to the case where $\bar r_t\in S$ for some $t$. In that case $w_{t+1}=0$, and $w_t-w_{t+1}$ may not be small. However, since $f_t(0)=0$, such terms will not affect the sum in (22). Recall that we need to establish (13) for $T$ such that $\bar r_T\notin S$. In that case, any time $t$ for which $\bar r_t\in S$ is followed by some time $m\le T$ with $\bar r_m\notin S$. Let $1\le k<m\le T$ be indices such that $\bar r_k,\dots,\bar r_{m-1}\in S$, but $\bar r_{k-1}\notin S$ (or $k=1$) and $\bar r_m\notin S$. Then $w_{k+1}=\dots=w_m=0$, and

$$\sum_{t=k}^{m}\big(f_t(w_t)-f_t(w_{t+1})\big)=f_k(w_k)-f_m(w_{m+1}).$$

Proceeding as above, we obtain, similarly to (23),

$$\|w_k-w_{m+1}\|\le\kappa_0\|\bar r_{k-1}-\bar r_m\|\le\kappa_0\,\mathrm{diam}(\mathcal{R})\sum_{t=k}^{m}\frac{1}{t},$$

and the regret bound in (25) may be obtained as above. □